Description
The malicious apps come from the VirusShare website, a constantly updated repository that provides large malware datasets to security researchers. We downloaded 27127 samples from this website, dated between July 2014 and September 2016. We collected the network traffic generated by these malicious apps, obtaining 18.9 GB of traffic in total, and extracted URL samples from it. Notably, not all URLs requested by malware are malicious, and malicious URLs may account for only a small part of them. To ensure that our training set has correct labels, we screened all URLs using the detection reports from VirusTotal. Only explicitly malicious URLs were added to the collection of malicious URLs. In the end, 11251 explicitly malicious URLs were added to our dataset.
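The screening step can be sketched as follows. This is only an illustration, not the pipeline used to build the dataset: the report layout (a mapping from engine name to a verdict string), the verdict label "malicious", and the `min_engines` threshold are all assumptions made for the example, and no real VirusTotal API call is shown.

```python
from typing import Dict, List

# Hypothetical report layout: URL -> {engine name -> verdict string}.
Reports = Dict[str, Dict[str, str]]


def is_explicitly_malicious(verdicts: Dict[str, str], min_engines: int = 2) -> bool:
    """Return True if at least `min_engines` engines flag the URL as malicious.

    The threshold is an assumption for illustration only.
    """
    flagged = sum(1 for verdict in verdicts.values() if verdict == "malicious")
    return flagged >= min_engines


def screen_urls(reports: Reports, min_engines: int = 2) -> List[str]:
    """Keep only the URLs whose detection report is explicitly malicious."""
    return [
        url
        for url, verdicts in reports.items()
        if is_explicitly_malicious(verdicts, min_engines)
    ]


# Made-up example data: one clearly flagged URL, one unflagged URL.
example_reports: Reports = {
    "http://bad.example/payload": {"engineA": "malicious", "engineB": "malicious"},
    "http://cdn.example/lib.js": {"engineA": "clean", "engineB": "clean"},
}
kept = screen_urls(example_reports)
```

URLs that are requested by malware but not flagged, such as the second entry above, are simply left out of the malicious collection.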
For the benign dataset, we downloaded a total of 6072 apps from multiple third-party application markets (hiapk, wandoujia, and yinyongbao). Apps downloaded from these markets are not always benign, so we also used VirusTotal to screen them. Only apps that VirusTotal confirmed as benign were added to our benign app collection. We then used the traffic-collection platform to obtain their traffic data. Ultimately, we obtained 25276 benign URLs from the 14.2 GB of collected traffic. Although there are more than twice as many benign URLs as malicious URLs, the FCM algorithm assigns different weights to samples from different classes, which helps offset the imbalance.
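To make the imbalance concrete, the sketch below computes per-class weights from the two URL counts given above. It uses plain inverse-frequency weighting, which is an assumption chosen for illustration; the weighting scheme actually used by the FCM algorithm is not described here and may differ.

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * class_count).

    With these weights, every class contributes the same total weight,
    so the smaller class is not dominated by the larger one.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# URL counts taken from the dataset description.
counts = {"malicious": 11251, "benign": 25276}
weights = inverse_frequency_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this scheme the malicious class receives the larger per-sample weight, in proportion to the ratio of benign to malicious URLs.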
Download Link
You can download the dataset used in our paper from here.